Informative Plots

Time Series

If we see a trend, the time series is not stationary: its statistical properties depend on the time of the observation.

Seasonal plot

If we see seasonality, the time series is not stationary: its statistical properties depend on the time of the observation.

Elizabeth_II

I don't know why this runs forever without showing the result.

  • time series decomposition
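
As an illustration of the decomposition item above, here is a minimal sketch on synthetic data. The series, its weekly pattern, and the choice of `frequency = 7` are assumptions made for the example; they are not the pageview data used in this report.

```r
# Synthetic daily series (hypothetical data): linear trend + weekly pattern + noise
set.seed(1)
n_weeks <- 20
n <- 7 * n_weeks
weekly <- rep(c(5, 3, 0, -2, -4, -3, 1), times = n_weeks)
x <- 100 + 0.5 * seq_len(n) + weekly + rnorm(n, sd = 1)

# frequency = 7 declares a weekly cycle for daily observations
x_ts <- ts(x, frequency = 7)

# Classical additive decomposition into trend, seasonal and random components
dec <- decompose(x_ts, type = "additive")
plot(dec)

# Estimated seasonal figure: one effect per position in the week
round(dec$figure, 2)
```

`decompose()` uses moving averages and is fast on series of this size, so it may be a practical first try when a heavier method takes too long.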

Autocorrelation plot

Autocorrelation quantifies the relationship between lagged values of a time series.
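
To make this concrete, the sketch below compares the sample autocorrelation of pure noise with that of a trended series. Both series are simulated for the example (an assumption, not the report's data); a slowly decaying autocorrelation is one of the usual visual signs of a trend.

```r
# Simulated series (hypothetical data): white noise, and the same noise plus a linear trend
set.seed(42)
n <- 200
noise <- rnorm(n)
trended <- 0.1 * seq_len(n) + noise

# Sample autocorrelation up to lag 20, without drawing the plot
acf_noise <- acf(noise, lag.max = 20, plot = FALSE)
acf_trend <- acf(trended, lag.max = 20, plot = FALSE)

# Element 1 is lag 0 (always 1); element k + 1 is lag k
acf_noise$acf[2]  # lag-1 autocorrelation of the noise, close to 0
acf_trend$acf[2]  # lag-1 autocorrelation of the trended series, close to 1
```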

Modified Series

\[ R_i = \frac{X_i - X_{i-7}}{X_{i-7}} \times 100 \]

\(R_i\) is the modified time series: for each pageview series, it is the relative percentage change between the observation at time \(i\) and the observation seven steps earlier, at time \(i-7\). It is defined from \(R_8\) onward.
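
The formula above can be sketched in a few lines of R. The function name `relative_change` and the input vector are hypothetical, chosen only to illustrate the arithmetic.

```r
# Relative percentage change with respect to the observation `lag` steps earlier.
# The first `lag` values have no reference observation and are left as NA.
relative_change <- function(x, lag = 7) {
  n <- length(x)
  r <- rep(NA_real_, n)
  if (n > lag) {
    idx <- (lag + 1):n
    r[idx] <- (x[idx] - x[idx - lag]) / x[idx - lag] * 100
  }
  r
}

# Hypothetical pageview counts for ten consecutive days
x <- c(100, 110, 120, 130, 140, 150, 160, 200, 110, 60)
r <- relative_change(x)
r
# R_8 = (200 - 100) / 100 * 100 = 100
# R_9 = (110 - 110) / 110 * 100 = 0
# R_10 = (60 - 120) / 120 * 100 = -50
```

Note that a zero count at time \(i-7\) makes the ratio undefined (R returns `Inf` or `NaN`), so such days need separate handling.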

2016_Summer_Olympics

Comparison of the time series before and after modification.

Peaks-Over-Threshold

## $y
## [1] "Latitude of Seismic Events"
## 
## attr(,"class")
## [1] "labels"

Suitability of POT: Princess Margaret, United Kingdom, United States (?) and Winston Churchill do not seem suitable for POT because they have too few exceedances.
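
One way to back up this kind of judgement is to count the exceedances of each series over its threshold. The sketch below does this on simulated data; the data frame `pages`, its columns, and the use of the empirical 95% quantile as threshold are all assumptions for illustration.

```r
# Simulated data (hypothetical): two series of 500 observations each
set.seed(7)
pages <- data.frame(
  type  = rep(c("page_a", "page_b"), each = 500),
  value = c(rexp(500, rate = 1), rexp(500, rate = 2))
)

# Hypothetical thresholds, one per series: the empirical 95% quantile
thr <- sapply(split(pages$value, pages$type), quantile, probs = 0.95, names = FALSE)

# Number of observations strictly above each series' own threshold
n_exceed <- sapply(names(thr), function(tp) {
  sum(pages$value[pages$type == tp] > thr[[tp]])
})
n_exceed
```

A series with only a handful of exceedances gives the GPD fit very little information, which is the concern raised above.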

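library(evd)
library(dplyr)  # for %>% and filter()

# data frame for the 99% quantile and its measure of uncertainty, one column per type
types <- unique(ts$type)
data99 <- data.frame(matrix(0, nrow = 2, ncol = length(types)))
colnames(data99) <- types
rownames(data99) <- c("quantile99", "std_err")

for (i in seq_len(ncol(data99))) {

  # keep the rows of the current type and drop missing values
  ts_type <- ts %>%
    filter(type == names(data99)[i]) %>%
    filter(!is.na(`daily count modified`))

  # compute the empirical 99% quantile
  quantile99 <- quantile(ts_type$`daily count modified`, 0.99)

  # save the quantile in the data frame
  data99[1, i] <- quantile99

  # measure of uncertainty
  # `thresholds` is assumed to hold one threshold per type, in the same order as `types`
  # not sure about the mper argument: the doc describes it as a return period
  # (in units of "periods" of npp observations), not a data value, so passing
  # quantile99 here is questionable
  # doc of the function here : https://www.rdocumentation.org/packages/evd/versions/2.3-3/topics/fpot
  uncertainty <- fpot(ts_type$`daily count modified`, threshold = thresholds[i], mper = quantile99)

  # with mper set, the fitted parameters are rlevel and shape,
  # so std.err[1] is the standard error of the return level
  # not sure if we need to save the r level or shape
  data99[2, i] <- uncertainty$std.err[1]
}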
## Warning in fpot.quantile(x = x, threshold = threshold, start = start, npp =
## npp, : optimization may not have succeeded

## Warning in fpot.quantile(x = x, threshold = threshold, start = start, npp =
## npp, : optimization may not have succeeded

## Warning in fpot.quantile(x = x, threshold = threshold, start = start, npp =
## npp, : optimization may not have succeeded

## Warning in fpot.quantile(x = x, threshold = threshold, start = start, npp =
## npp, : optimization may not have succeeded

Detecting Simultaneous High Load

The goal is to detect simultaneous high load across the 12 series provided: which pages seem to have simultaneous high load?
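
A simple empirical starting point, sketched below under stated assumptions, is to flag each series' exceedances of its own high quantile and then count how many series are in exceedance on the same day. The data are simulated, the 0.99 level is an arbitrary choice, and this is not the method of module 4; it only illustrates the idea of co-exceedance.

```r
# Simulated data (hypothetical): three daily series, two of which share common spikes
set.seed(3)
n_days <- 400
common_shock <- rep(0, n_days)
common_shock[c(50, 200)] <- 10  # days on which page_a and page_b spike together

series <- cbind(
  page_a = rnorm(n_days) + common_shock,
  page_b = rnorm(n_days) + common_shock,
  page_c = rnorm(n_days)
)

# Flag each series' exceedances of its own empirical 0.99 quantile
exceed <- apply(series, 2, function(x) x > quantile(x, 0.99))

# Number of series in exceedance on each day, and the days with at least two
n_high <- rowSums(exceed)
which(n_high >= 2)

# Pairwise counts of days with joint exceedance (diagonal: each series' own count)
co_exceed <- crossprod(exceed * 1)
co_exceed
```

Days 50 and 200 are among the days flagged, and the `page_a`/`page_b` entry of `co_exceed` is the largest off-diagonal count, which is the pattern one would look for among the 12 real series.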

# see module 4

library(extRemes)
## Loading required package: Lmoments
## Loading required package: distillery
## 
## Attaching package: 'extRemes'
## The following objects are masked from 'package:evd':
## 
##     fbvpot, mrlplot
## The following objects are masked from 'package:stats':
## 
##     qqnorm, qqplot
# idea for graphical representation: block maxima by week colored by type
# https://rdrr.io/cran/extRemes/man/blockmaxxer.html
tsnona <- ts %>% filter(!is.na(`daily count modified`))

# compute block maxima
# note: blocks = tsnona$date gives one maximum per date (across types), not per week
bm <- blockmaxxer(tsnona, blocks = tsnona$date, which = "daily count modified")

library(plotly)

# all observations colored by type, with the block maxima overlaid in pink
p <- ggplot(tsnona, aes(x = date, y = `daily count modified`)) +
  geom_point(aes(color = type)) +
  geom_point(data = bm, aes(date, `daily count modified`, fill = type), colour = "lightpink1")

ggplotly(p)
# numerical method
# GPD model?
# https://rdrr.io/cran/evir/man/gpd.html
library(evir)
## 
## Attaching package: 'evir'
## The following object is masked from 'package:extRemes':
## 
##     decluster
## The following objects are masked from 'package:evd':
## 
##     dgev, dgpd, pgev, pgpd, qgev, qgpd, rgev, rgpd
## The following object is masked from 'package:ggplot2':
## 
##     qplot
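# drop missing values before fitting
modified_NoNA <- modified_ts %>% filter(!is.na(`daily count modified`))

# fit a GPD above a single threshold, the mean of `thresholds`
gpd.model <- gpd(modified_NoNA$`daily count modified`, threshold = mean(thresholds))

# plot the fitted tail of the distribution
gpd.plot <- tailplot(gpd.model)

# expected shortfall at the 0.99 level (not run)
#gpd.sf <- gpd.sfall(gpd.plot,0.99)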